Mixability in Statistical Learning
Abstract
Statistical learning and sequential prediction are two different but related formalisms to study the quality of predictions. Mapping out their relations and transferring ideas is an active area of investigation. We provide another piece of the puzzle by showing that an important concept in sequential prediction, the mixability of a loss, has a natural counterpart in the statistical setting, which we call stochastic mixability. Just as ordinary mixability characterizes fast rates for the worst-case regret in sequential prediction, stochastic mixability characterizes fast rates in statistical learning. We show that, in the special case of log-loss, stochastic mixability reduces to a well-known (but usually unnamed) martingale condition, which is used in existing convergence theorems for minimum description length and Bayesian inference. In the case of 0/1-loss, it reduces to the margin condition of Mammen and Tsybakov, and in the case that the model under consideration contains all possible predictors, it is equivalent to ordinary mixability.
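For reference, both notions admit compact definitions; the following is a sketch in notation assumed here, not taken from the abstract itself. A loss \ell is \eta-mixable (\eta > 0) if for every distribution \pi over predictions there exists a single prediction a such that

\[ \ell(a, y) \;\le\; -\frac{1}{\eta} \log \int e^{-\eta\, \ell(f, y)}\, \pi(\mathrm{d}f) \qquad \text{for all outcomes } y. \]

In the statistical setting, a triple (\ell, \mathcal{F}, P) is \eta-stochastically mixable if, with f^* the risk minimizer in \mathcal{F},

\[ \mathbb{E}_{Z \sim P}\!\left[ e^{-\eta\,(\ell(f, Z) - \ell(f^*, Z))} \right] \;\le\; 1 \qquad \text{for all } f \in \mathcal{F}. \]

For log-loss \ell(f, z) = -\log f(z) with f^* the true density, the condition at \eta = 1 reduces to \mathbb{E}[f(Z)/f^*(Z)] \le 1, which is the martingale-type condition referred to above.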
Similar Papers
From Stochastic Mixability to Fast Rates
Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems in which the data are generated according to some unknown distribution P; it returns a hypothesis f chosen from a fixed class F with small loss ℓ. In the parametric setting, depending on (ℓ, F, P), ERM can have slow (1/√n) or fast (1/n) rates of convergence of the excess risk as a function of the sample size n ...
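In symbols (a sketch, with notation assumed rather than taken from the paper), ERM returns

\[ \hat{f}_n \;\in\; \operatorname*{arg\,min}_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \ell(f, Z_i), \]

and the slow and fast rates refer to the decay of the excess risk \mathbb{E}[\ell(\hat{f}_n, Z)] - \inf_{f \in \mathcal{F}} \mathbb{E}[\ell(f, Z)] as O(1/\sqrt{n}) or O(1/n), respectively.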
Fast rates in statistical and online learning
The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning — a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most...
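A representative example of such a condition is the Bernstein condition (a standard formulation, sketched here with assumed constants B > 0 and \beta \in (0, 1]):

\[ \mathbb{E}\big[ (\ell(f, Z) - \ell(f^*, Z))^2 \big] \;\le\; B \left( \mathbb{E}\big[ \ell(f, Z) - \ell(f^*, Z) \big] \right)^{\beta} \qquad \text{for all } f \in \mathcal{F}, \]

under which ERM-type procedures attain excess-risk rates of order n^{-1/(2-\beta)}, interpolating between the slow 1/\sqrt{n} regime and the fast 1/n regime at \beta = 1.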
Mixability is Bayes Risk Curvature Relative to Log Loss
Mixability of a loss characterizes fast rates in the online learning setting of prediction with expert advice. The determination of the mixability constant for binary losses is straightforward but opaque. In the binary case we make this transparent and simpler by characterising mixability in terms of the second derivative of the Bayes risk of proper losses. We then extend this result to multiclass ...
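The characterization can be sketched as follows (notation mine): writing L_\ell(v) for the conditional Bayes risk of a proper binary loss \ell when P(Y = 1) = v, the mixability constant is

\[ \eta_\ell \;=\; \inf_{v \in (0,1)} \frac{L''_{\log}(v)}{L''_{\ell}(v)}, \qquad L''_{\log}(v) = -\frac{1}{v(1-v)}. \]

As a check, squared loss has Bayes risk L(v) = v(1-v), so L''(v) = -2 and \eta_\ell = \inf_v 1/(2v(1-v)) = 2, recovering the classical fact that squared loss is 2-mixable.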
Fast rates with high probability in exp-concave statistical learning (Appendix A: Proofs for Stochastic Exp-Concave Optimization)
This condition is equivalent to stochastic mixability as well as to the pseudoprobability convexity (PPC) condition, both defined by Van Erven et al. (2015). To be precise, for stochastic mixability, in Definition 4.1 of Van Erven et al. (2015), take their F_d and F both equal to our F, their P equal to {P}, and ψ(f) = f*; then strong stochastic mixability holds. Likewise, for the PPC condition, i...
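For context, exp-concavity itself (the property in the paper's title) has a short standard definition, sketched here rather than quoted from the paper: a loss \ell is \eta-exp-concave if, for every outcome z, the map

\[ f \;\mapsto\; \exp\big( -\eta\, \ell(f, z) \big) \]

is concave on the convex class \mathcal{F}. Log-loss is 1-exp-concave, since \exp(-\ell(f, z)) = f(z) is linear (hence concave) in the density f, and squared loss with predictions and outcomes in [0, 1] is (1/2)-exp-concave.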
Joint Mixability of Elliptical Distributions and Related Families
In this paper, we further develop the theory of complete mixability and joint mixability. We provide sufficient conditions for certain univariate distributions with one-sided unbounded support to be neither jointly mixable nor completely mixable, and we give an alternative proof of a result of Wang and Wang (2016) related to the joint mixability of elliptical distributions with the same characteristic generator ...
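For reference, the two notions have short standard definitions (sketched here, not quoted from the abstract): univariate distributions F_1, \dots, F_n are jointly mixable if there exist random variables X_i \sim F_i, defined on a common probability space, whose sum is almost surely constant,

\[ X_1 + X_2 + \cdots + X_n \;=\; C \quad \text{a.s. for some } C \in \mathbb{R}, \]

and a single distribution F is n-completely mixable if this holds with F_1 = \cdots = F_n = F.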